klotz: computer vision*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. The paper introduces LeWorldModel (LeWM), a stable Joint-Embedding Predictive Architecture (JEPA) that trains end-to-end directly from raw pixels. Unlike existing methods that rely on complex losses, pre-trained encoders, or auxiliary supervision to prevent representation collapse, LeWM uses only two loss terms: next-embedding prediction and Gaussian latent regularization. This approach significantly simplifies the training process by reducing tunable hyperparameters. The model is highly efficient, with approximately 15 million parameters capable of being trained on a single GPU within hours, and it offers planning speeds up to 48x faster than foundation-model-based world models while remaining competitive in 2D and 3D control tasks. Additionally, the latent space effectively encodes physical structures, allowing the model to detect physically implausible events through surprise evaluation.
  2. The M.2 Max is an AI inference acceleration card powered by the Metis AIPU, designed to enable Large Language Models (LLMs) and Vision Language Models (VLMs) on power-constrained edge and embedded devices. It offers high memory performance in a small footprint and supports complex computer vision tasks using parallel or cascaded models.
    Key features include:
    - Memory capacities up to 16 GB with various cooling options.
    - Support for standard and extended operating temperature ranges.
    - Hardware Root-of-Trust for secure boot and firmware integrity.
    - Integration via the Voyager SDK and advanced quantization tools.
    - Compatibility with PCIe Gen. 3.0 x4, Intel, AMD, and Arm64 processors across Linux and Windows environments.
  3. A technical guide to running lightweight OCR models (LightOnOCR, GLM-OCR, Deepseek-OCR) on low-end hardware using llama.cpp. Includes implementation details for CLI, REST APIs, and performance optimization.

    Topics Covered:

    - llama.cpp OCR integration
    - Low-spec hardware optimization
    - CLI & REST API setup
    - Quantization & Prompting
    - Hallucination mitigation
  4. This is an open, unconventional textbook covering mathematics, computing, and artificial intelligence from foundational principles. It's designed for practitioners seeking a deep understanding, moving beyond exam preparation and focusing on real-world application. The author, drawing from years of experience in AI/ML, has compiled notes that prioritize intuition, context, and clear explanations, avoiding dense notation and outdated material.
    The compendium covers a broad range of topics, from vectors and matrices to machine learning, computer vision, and multimodal learning, with future chapters planned for areas like data structures and AI inference.
  5. Sipeed’s MaixCAM2 is a powerful, open-source AI camera designed for makers, offering significant performance improvements over Raspberry Pi and OpenMV solutions. It features the Axera Tech AX630 AI SoC with up to 12.8 TOPS and supports training-free vision models and vision-language models.
  6. Introduction to the OSOYOO V4.0 Robot Car for Raspberry Pi, highlighting its advanced features and capabilities for complex robotic projects compared to Arduino-based kits.
  7. Moondream transforms the humble Raspberry Pi into a context-aware visual interpreter, capable of answering nuanced questions about images in plain English. This guide explores its potential for home automation, security analysis, and more.
  8. This book covers foundational topics within computer vision, with an image processing and machine learning perspective. It aims to build the reader’s intuition through visualizations and is intended for undergraduate and graduate students, as well as experienced practitioners.
  9. Creativity and a Jetson Orin Nano Super can help hobbyists build accessible robots that can reason and interact with the world. The article discusses building a robot using accessible hardware like Arduino and Raspberry Pi, eventually upgrading to more capable hardware like the Jetson Orin Nano Super to run a large language model (LLM) onboard.
  10. Learn how to use Python and OpenCV to perform face detection and recognition. This tutorial also covers concepts like bounding boxes, intersection over union (IoU), and grayscale conversion.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: computer vision

About - Propulsed by SemanticScuttle